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ABSTRACT 

We develop a Bayesian hierarchical modelling approach for cosmic shear power spec¬ 
trum inference, jointly sampling from the posterior distribution of the cosmic shear 
field and its (tomographic) power spectra. Inference of the shear power spectrum is a 
powerful intermediate product for a cosmic shear analysis, since it requires very few 
model assumptions and can be used to perform inference on a wide range of cosmolog¬ 
ical models a posteriori without loss of information. We show that joint posterior for 
the shear map and power spectrum can be sampled effectively by Gibbs sampling, it¬ 
eratively drawing samples from the map and power spectrum, each conditional on the 
other. This approach neatly circumvents difficulties associated with complicated sur¬ 
vey geometry and masks that plague frequentist power spectrum estimators, since the 
power spectrum inference provides prior information about the field in masked regions 
at every sampling step. We demonstrate this approach for inference of tomographic 
shear i?-mode, i3-mode and EB-cross power spectra from a simulated galaxy shear 
catalogue with a number of important features; galaxies distributed on the sky and in 
redshift with photometric redshift uncertainties, realistic random ellipticity noise for 
every galaxy and a complicated survey mask. The obtained posterior distributions for 
the tomographic power spectrum coefficients recover the underlying simulated power 
spectra for both E- and B-modes. 

Key words: data analysis - weak lensing - gibbs sampling - wiener filter - messenger 
field - cosmology 


1 INTRODUCTION 


As light from distant galaxies propagates through the Uni¬ 
verse it is continuously deflected by the gravitational po¬ 
tential of the large-scale matter distribution, resulting in a 
coherent distortion of observed galaxy images on the sky. 
This weak gravitational lensing provides a powerful probe 
of the growth rate of potential perturbations and the geom¬ 
etry of the Universe through the distance-redshift relation. 
Weak lensing has unique appeal in its sensitivity to the full 
matter distribution in the Universe, allowing us to probe 
the 3D matter power spectrum over a range of scales and 
redshifts. Since weak lensing is a function of both the ge¬ 
ometry of the universe and the growth of structure, it is a 
particularly sensitive probe of dark energy and gravity on 


large scales (see e.g., Weinberg et al. 2013 and references 
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therein). Cosmic shear analyses in particular aim to extract 
cosmological information from weak lensing by measuring 
correlations of galaxy ellipticities, modified by the lensing 
field, across the sky (see Munshi et al.|[2008 for a compre¬ 
hensive review). The natural starting point for such an anal¬ 
ysis is to look at the two-point statistics of the shear field, 
although a substantial amount of information may also be 


available in the higher-order shear statistics ( 

Bernardeau, 

van Waerbeke & Mellier 19971 i^an Waerbeke, Bernardeau 

& Mellier|1999 

Schneider & Lombardi|2003 Takada & Jain 

2003 Vafaei et al.||2010 Kayo, Takada & Jain||2013l. This 


paper is concerned with extracting cosmological inferences 
from the two-point statistics of the cosmic shear field. 


When analysing the two-point statistics of a random 
field, we are free to choose the most convenient basis to 
work in. For example, working in the pixel basis the two- 
point function of the shear field is the real-space correla¬ 
tion function, whilst if we choose to work in harmonic space 
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the two-point function is the angular power spectrum. The 
power spectrum has a clear advantage over the correlation 
function; due to the statistical isotropy of the shear field, its 
spherical harmonic coefficients are uncorrelated and hence 
the covariance matrix of the field in this basis is sparse. The 
covariance of the real-space shear field on the other hand is 
not sparse, since the correlation function is non-zero over all 
scales. This is a major advantage of using the power spec¬ 
trum over the correlation function in cosmic shear analyses 
and will become increasingly important for larger-area weak 
lensing surveys. 

Inference of the shear power spectrum is an incredi¬ 
bly powerful intermediate product for a cosmic shear anal¬ 
ysis, since it can be used to perform inference on a wide 
range of cosmological models a posteriori without loss of 
information. Power spectrum inference is almost model in¬ 
dependent, assuming only that the field under inspection is 
statistically isotropic. In this study we further assume that 
the field is well-described by Gaussian statistics, but also 
note that the Gaussian distribution constitutes the maxi¬ 
mum entropy prior once the mean and covariance are spec¬ 
ified, so from a Bayesian perspective assuming a Gaussian 
distribution for the field may be a well-justified (although 
sub-optimal) approximation even on small scales where the 
shear held is non-linear. Since a wide range of cosmological 
models provide a deterministic relationship between the cos¬ 
mological model parameters and the power spectrum, cos¬ 
mological parameter inference can be performed a posteriori 
from the posterior distribution of the power spectrum given 
the data, without loss of information. This is clearly prefer¬ 
able to performing inference for each model independently 
from the full data-set and allows for efficient analysis of fu¬ 
ture cosmological models without having to re-analyse the 
full data set from scratch. 

In this paper we develop a Bayesian method for inferring 
the cosmic shear power spectrum by jointly sampling from 
the posterior distribution of the shear map and power spec¬ 
trum given the data, in a hierarchical fashion. A similar ap¬ 
proach has been developed for analysing cosmic microwave 


background (CMB) temperature maps ( 

Wandelt, Larson & 

Lakshminarayanan|2004| lEriksen et al.|2004||0’Dwyer et al. 

2004 

Chu et al.|2005 

Komatsu et al.|2011 

Ade et al.|2014 1, 

CMB polarization (Larson et al. 2007 

ilriksen et al.||2007 

Komatsu et al. 2011 

Karakci et al. 20131 and large-scale 

structure (Jasche et al.|2010 Jasche & Wandelt |20 12 20131. 


Gosmic shear bears some similarities and some important 
differences to the CMB and large-scale structure inference 
problems. The shear field is a 3D random field with spin- 
weight-2 on the angular sky. Shear is generally assumed to 
be a statistically isotropic field, but since lensing is an inte¬ 
grated effect it is not statistically homogeneous, in the sense 
that its statistical properties evolve strongly with redshift. 
On large scales, the shear field is Gaussian to a good approx¬ 
imation, whilst on small scales it becomes significantly non- 
Gaussian. As such, weak lensing combines together many of 
the key features seen in the context of CMB temperature, 
polarization and LSS power spectrum inference problems, 
whilst the inhomogeneity of the shear field adds an addi¬ 
tional new feature; in this sense weak lensing power spec¬ 
trum inference is a particularly rich statistical problem. 

One of the main challenges in estimating the power 
spectrum from weak lensing survey data is accounting for 


complicated survey geometry due to both incomplete sky 
coverage and masked regions within the survey area. The 
problem stems from the fact that Fourier and spherical har¬ 
monic basis functions are not orthogonal on the cut sky, 
which can result in masks moving power from the angular 
scale of the masks to other parts of the power spectrum and 
leakage between E- and B-mode power. These problems are 
a major inconvenience for approximate power spectrum esti¬ 
mation methods such as the pseudo-G^ (see e.g., 
|2004| and |Brown, Castro fc Taylor|2005| and also 
for an estimator approach that avoids some of these issues). 
Fortunately the approach presented here bypasses these dif- 
hculties completely. By estimating the map and power spec¬ 
trum simultaneously in a block-MCMC or Gibbs sampling 
framework, we iteratively sample from the map conditional 
on the power spectrum and the power spectrum conditional 
on the map. When sampling from the conditional distribu¬ 
tion of the map with a fixed power spectrum, even though 
the data provides no information about the masked regions 
the power spectrum still provides (probabilistic) information 
about the field in those regions. Masked regions are treated 
as pixels with inhnite noise, but the inferred power spectrum 
(combined with inference of pixels surrounding the masked 
region) nonetheless informs us about the field in the masked 
regions, circumventing the need to treat masked regions as 
being cut from the analysis and simplifying the survey ge¬ 
ometry. 


Chon et al. 
Smith|2006 


Drawing inferences about cosmology from weak lens¬ 
ing survey data is a challenging task, involving a number of 
complex modelling elements. Measuring galaxy shapes and 
redshifts from pixelized images and photometric data re¬ 
quires detailed models for the telescope point-spread func¬ 
tion (PSF), seeing effects, pixel noise and other instrumen¬ 
tal effects, as well as models for the intrinsic distributions of 
galaxy properties which determine their physical and photo¬ 
metric appearance. Cosmological parameter inference from 
observed galaxy shapes and redshifts requires a model re¬ 
lating the cosmology (statistically) to the cosmic shear field 
and in turn its impact on observed galaxies. Formulating the 
weak lensing inference task as a global hierarchical model is 
an attractive approach for a number of reasons. Hierarchi¬ 
cal models account for the full statistical interdependency 
structure of all model components and allow information to 
flow freely from raw pixel and photometric data through to 
cosmological inferences; this is optimal in the sense that no 
information is lost, and principled in the sense that param¬ 
eter uncertainties are propagated correctly and completely 
throughout the analysis (see Schneider et al.|2015 for a dis¬ 
cussion). However, whilst it is the optimal approach in prin¬ 
ciple, in practice performing a global analysis on a data set 
as large and complex as a weak lensing survey is a formidable 
challenge and a global analysis may not be computation¬ 
ally feasible. The alternative approach is to break the global 
problem up into a number of sub-problems which are anal¬ 
ysed in a series of steps, where the output of each step is 
used as input for the next. Whilst this has the advantage 
that sub-problems are computationally easier to solve, it is 
sub-optimal in the sense that full parameter interdependen¬ 
cies are not accounted for and it is challenging to propa¬ 
gate uncertainties consistently throughout the pipeline, in¬ 
troducing biases which must be carefully corrected for. The 
hierarchical modelling approach to map-power spectrum in- 
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ference developed here will form a central part of any larger 
global hierarchical model for weak lensing, or alternatively 
Bayesian map-power spectrum inference can be performed 
in isolation from e.g., a catalogue of measured galaxy shapes 
and redshifts, once instrumental effects, ellipticity distribu¬ 
tions etc have been modelled and accounted for a priori. 

The structure of this paper is as follows: in ii we dis¬ 
cuss the global hierarchical modelling approach to cosmic 
shear inference and isolate the map-power spectrum infer¬ 
ence problem in §2.1[ In §2.2| we specialize to tomographic 
cosmic shear and in we develop the Gibbs sampling ap¬ 
proach for joint shear map-power spectrum inference. In !|^ 
we describe the shear simulations and we demonstrate the 
method by recovering tomographic E- and B-mode shear 
power spectra from a simulated shear catalogue in !|^ We 
conclude in 


2 WEAK LENSING AS A HIERARCHICAL 
MODELLING PROBLEM 

The statistical task in weak lensing can be summarized as 
follows: given noisy, PSF convolved, pixelized images and 
photometric data for a large number of galaxies, how do we 
make inferences about cosmological models and parameters? 
The full inference problem involves modelling a number of 
processes, which can be broadly split into three groups: (1) 
How is the shear held (statistically) related to the cosmo¬ 
logical model/parameters? (2) What are the distributions of 
galaxy properties that characterise their appearance, both 
those which change under lensing (e.g., shape, size, appar¬ 
ent magnitude etc) and those which do not (e.g., Sersic 
index, colour etc), and what is the distribution of galaxy 
redshifts? (3) What are the telescope PSF, noise properties 
and other instrumental effects that generate noisy, pixelized, 
PSF convolved images and photometric data from galaxies 
with given physical characteristics and redshifts? In princi¬ 
ple we can write down a hierarchical model combining all 
of these processes and solve the global inference problem. 
In this section we outline the global hierarchical modelling 
approach for weak lensing and highlight the key advantages 
and challenges. This work complements that of |Schneider| 
et al. (20151, who investigated in some detail the sections 


of the hierarchy concerned with pixel data and galaxy prop¬ 
erties. Here we focus on a different part of the hierarchy, 
starting with point estimates of shear, and ending with the 
shear power spectra. A global statistical model would in¬ 
clude both. 

In order to write down a global hierarchical model for 
weak lensing let us first define a set of parameters that the 
model could include. A basic but comprehensive set of model 
parameters could be defined as follows: cosmological param¬ 
eters 0, the shear field s, a set of physical galaxy properties 
(size, shape, magnitude etc) {g} and redshifts z for each 
galaxy, a set of parameters {^} characterizing the distri¬ 
butions of intrinsic galaxy properties and redshifts, a set 
of parameters {x} characterizing the distribution of PSF 
parameters and pixel noise and a set of parameters {n} 
characterizing the effective PSF and noise for different pho¬ 
tometric bands, epochs, positions on the sky etc. The data 
are pixelized images for each galaxy dpix, photometric data 
for each source (on which the photometric redshift inference 


PSF, instrumental noise 


cosmology galaxy 

characteristics 



Figure 1. A generative forward model for weak lensing pixel 
and photometric data: cosmological parameters 0 , a set of pa¬ 
rameters {4} characterizing the distributions of physical galaxy 
characteristics and redshifts and parameters {x} characterizing 
the distribution of PSF and instrumental noise properties are re¬ 
alised from their respective prior distributions. The cosmology 
then generates a realization of the shear field s, a set of physi¬ 
cal galaxy properties {g} and redshifts z are generated for each 
galaxy from their specified distributions, and a set of effective 
PSFs and instrumental noise properties (for each band, epoch etc) 
{n} are realised given their intrinsic distributions characterized 
by {x}' Pixelized images for each galaxy dpix a^re then realised 
given the galaxy characteristics, redshifts, instrumental PSF and 
noise properties, photometric data Zpj, (on which photometric 
redshift inference is based) are generated given the true redshifts, 
and some auxiliary data daux providing additional information 
about the instrumental properties is realised given the PSF and 
noise properties. Note that this is by no means the most general 
model and might straightforwardly be extended to include addi¬ 
tional parameter and data interdependencies, additional model 
parameters, hyperpriors and additional data products. 


is based) Zph and some auxiliary data providing additional 
information specifically about the PSF and pixel noise prop¬ 
erties daux. Note that the discussion in this section is quite 
general and the field s refers to the full 3D shear field; in §2.2| 
we specialise to tomographic shear where s will thereafter 
refer to the set of 2D tomographic shear fields. 

A particularly elegant and useful way of visualizing for¬ 
ward hierarchical models is as a directed, acyclic bipartite 
graph, where model parameters and data (nodes, repre¬ 
sented by white circles) are connected via conditional prob¬ 
ability distributions (represented by orange boxes), for ex¬ 
ample Fig. [^(described in detail below). This representation 
clearly elicits the conditional structure of the inference prob¬ 
lem; parameters and data which are directly connected (via 
a single conditional density) are dependent, whereas param¬ 
eters and data which are not directly connected are con¬ 
ditionally independent. The posterior distribution for the 
full set of model parameters is straightforwardly obtained 
in a readily factorized form, accounting for the full condi- 
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tional structure of the problem, by taking the product of all 
distributions appearing in the graph (upto a normalization 
constant). Furthermore, the distribution of a single parame¬ 
ter node conditional on all others is given by the product of 
the conditional densities on all incoming and outgoing edges 
for that parameter (again upto a normalization constant). 
Each conditional probability distribution can be thought of 
as a separate modelling step; the hierarchical model thus 
breaks up the global problem into a number of sub-models, 
where one only needs to be able to write down conditional 
distributions for the various sub-sets of model parameters 
with all other parameters held fixed. Whilst the hierarchi¬ 
cal model describes the global inference problem, it is still 
modular in the sense that the model neatly factorizes into a 
set of sub-problems that can be attacked in turn. 

A generative forward model for the data and model as¬ 
sumptions described in the previous paragraphs is given in 
Fig.[T] and can be understood as follows: In the top level, 0, 
{^} and {x} are drawn from their respective priors. In the 
second level, the cosmology generates a shear field s, a set 
of intrinsic galaxy properties {g} and redshifts is generated 
from P(z, {g}|{^}), and the PSFs and pixel noise charac¬ 
teristics are generated from P({II}|{x})- Finally, the pixel 
values are generated given the galaxy positions and physical 
properties, the shear field, PSFs and pixel noise characteris¬ 
tics, the auxiliary data is generated from {11}, and the pho¬ 
tometric redshift data is generated from the true redshifts. 
This model is by no means the most general and can easily 
be extended to include further levels of sophistication, such 
as additional parameter and data interdependencies, addi¬ 
tional model parameters, parameterized hyperpriors and ad¬ 
ditional data products. 

The hierarchical model in Fig. [^clearly shows the sta¬ 
tistical interdependencies between the various parts of the 
model. Ideally we would like to solve the global inference 
problem, simultaneously inferring all of the model parame¬ 
ters given the data and marginalizing over all latent param¬ 
eters that are not of direct interest. This approach correctly 
accounts for the complicated web of interdependencies and 
allows information to flow unimpeded from the data through 
to the cosmological parameter inference; it is optimal in the 
sense that no information is lost, and principled in the sense 
that the uncertainties in all parameters are correctly and 
completely propagated throughout the analysis. In contrast, 
the frequentist approach typically analyses each part of the 
model in a series of consecutive steps, where the results of 
each step are used as inputs for the next. This makes it chal¬ 
lenging to both correctly account for the full statistical in¬ 
terdependency and to propagate uncertainties consistently, 
leading to biases that must be carefully (and painstakingly) 
corrected for. In spite of being a global analysis, the hierar¬ 
chical approach is in fact naturally modular; the Gibbs or 
block-MCMC sampling scheme separates the various steps 
in the inference process, iteratively dealing with each com¬ 
ponent in turn. Whilst the typical frequentist approach es¬ 
timates fixed values for parameters at each step and feeds 
them into the next, the hierarchical approach feeds the full 
probabilistic inference about each parameter throughout the 
analysis. 

In principle, it is possible to perform an (asymp¬ 
totically) optimal frequentist analysis, writing down the 
global multi-parameter likelihood and performing a joint 


maximum-likelihood analysis for all model parameters from 
the global likelihood. However, estimating parameter uncer¬ 
tainties accurately in a high-dimensional multi-parameter 
maximum-likelihood analysis is expected to be consider¬ 
ably more computationally challenging than implementing a 
Markov chain for the equivalent Bayesian inference problem, 
given a hxed data set. The Bayesian hierarchical modelling 
approach to weak leasing has clear advantages over both the 
sub-optimal frequentist approach, analysing each part of the 
model in series (as described above), and the more careful 
global maximum-likelihood analysis. 

Given the scale and complexity of the weak leasing in¬ 
ference problem described in Fig. a sensible approach to 
developing a global analysis pipeline is to temporarily break 
up the global model into a number of sub-models (for exam¬ 
ple, the three columns in Fig. [^, allowing us to tackle the 
technical challenges associated with different parts of the hi¬ 
erarchy in isolation. These can later be re-united to build as 
global an analysis pipeline as possible. 


2.1 Isolating the map-power spectrum inference 
problem 

In this paper we are interested in isolating the central col¬ 
umn of the hierarchy in Fig. [^(highlighted in blue), i.e., ex¬ 
tracting cosmological parameter inferences from weak leas¬ 
ing data via the cosmic shear held. Rather than attempt 
to infer cosmological parameters directly, we would rather 
like to infer the shear power spectrum given the data. Cos¬ 
mological models provide a deterministic relationship be¬ 
tween cosmological parameters and the shear power spec¬ 
trum. This means that cosmological parameter inference can 
be performed a posteriori for a wide range of models directly 
from the posterior distribution for the shear power spectrum 


given the data, without loss of information (see (3.3 for de¬ 
tails). As such, inference of the shear power spectrum is a 
very useful intermediate product and is preferable to do¬ 
ing cosmological parameter inference directly from the data 
for the various models of interest separately. With this in 
mind we isolate and tackle the following sub-problem: given 
some (pre-processed) weak leasing data products, we want 
to jointly infer the shear held and its power spectrum. Fur¬ 
thermore, we will assume the shear held is Gaussian and 
fully characterized by its two-point statistics, i.e., its power 
spectrum (covariance matrix) C. This reduced problem is 
summarized by the forward model in Fig. and can be un¬ 
derstood as follows: We begin by specifying a prior for the 
power spectrum P{C) which generates a power spectrum C. 
This power spectrum then generates a shear field s via the 
density P(s|C), which we take to be a zero mean Gaussian 
with covariance C. We then hx the noise covariance matrix 
N and add noise to the realised shear map to give a real¬ 
isation of the data, a noisy estimate of the shear held, via 


the conditional density P(djs,N). In (2.2 we specialize to 
tomography, and C and s will be understood to denote the 
tomographic power spectra and tomographic fields respec¬ 
tively. 

Since the left and right hand sides of the hierarchy from 
Fig.[T] have been removed in the reduced problem, we are 
forced to use as our data vector some processed version of 
the raw pixel data dpix for which the instrumental effects, 
distributions of galaxy properties and photometric redshifts 
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Figure 2. Hierarchical forward model for noisy pixelized shear 
maps d from the tomograhic shear power spectra C: the shear 
power spectrum C is drawn from some prior distribution, a real¬ 
ization of the tomographic shear fields s are then generated given 
the power spectra, and finally noisy tomographic shear maps d 
are realised by adding noise with covariance N. 


shear field; by restricting ourselves to a tomographic analy¬ 
sis without modelling the fluctuation of the field within the 
redshift bins, we cannot forward model a catalogue of indi¬ 
vidual galaxy shapes, angular positions and redshifts. It is 
possible, however, to write down a forward model for the 
average ellipticities of sources binned in redshift at angular 
positions on the sky with reference only to the tomographic 
shear fields. Therefore we are forced to process the cata¬ 
logue of galaxy shapes and positions further into pixelized 
2D maps of the average shapes in each pixel for sources in 
each redshift bin. This demonstrates clearly that a tomo¬ 
graphic analysis is sub-optimal on two counts: information 
is lost in compressing the data from a full 3D catalogue of 
galaxy shapes to a collection of averages (since the detailed 
redshift dependence of the power spectrum contains cosmo¬ 
logical information), and secondly in a tomographic analy¬ 
sis it is not possible to include the full interdependence of 
the shear field and galaxy redshifts shown in Fig. The 
formalism developed in this work can be straightforwardly 
extended to perform a fully 3D shear analysis (albeit at ad¬ 
ditional computational cost) and we will investigate this in 
future work. 

The technical details of the processed data vector d, 
field s, noise and signal covariances N and C for a tomo¬ 
graphic shear analysis are described in detail in § 2 . 2 . 1 | and 
l 2 . 2 . 2 l below. 


have already been modelled and accounted for a priori. We 
take our pre-processed data to be a catalogue of measured 
galaxy shapes, angular positions and photometric redshifts, 
where the effects of {x}i {n}, {^}, {g} and z have been ac¬ 
counted for in the shape and photometric redshift inference 
process. In the next section we specialize the problem fur¬ 
ther to a tomographic cosmic shear analysis; we will see that 
this requires the data to be compressed further by grouping 
sources into a number of redshift bins and angular pixels 
on the sky. The resulting data vector will be a set of noisy 
pixelized shear maps for multiple tomographic redshift bins. 


2.2 Tomographic cosmic shear 


Ideally we would like to analyse the full 3D shear field, and 
formalism for performing 3D cosmic shear analysis is now 


well developed (Heavens 2003 Castro, Heavens & Kitch 


ing|2005| Heavens, Kitching & Taylor 

2006 

Kitching et al. 

2007| Kitching, Taylor & Heavens|2008 

|Kitc 

ling. Heavens & 


duction in technical difficulty can be achieved whilst retain¬ 
ing a large fraction of the cosmological information by per¬ 
forming a tomographic rather than fully 3D analysis (e.g.. 


Hu 19991. Here sources are separated into a number of tomo¬ 


graphic redshift bins and we analyse the shear field averaged 
over these slices in redshift, i.e., we reduce the 3D shear field 
7 (r) to a set of 2D fields { 7 ^“^ ( 6 , ()i)}, where a denotes the 
redshift bin. This collection of 2D fields will form a field vec¬ 
tor s with covariance C (i.e., the tomographic shear power 
spectra). 

A forward model for the shapes of galaxies at points in 
3D space necessarily requires some reference to the full 3D 


2.2.1 The data vector 

Tensing can be observed through a change in observed 
galaxy ellipticities. In the weak lensing limit, observed (com¬ 
plex) galaxy shapes e are modified from their intrinsic un- 
lensed values eo by the complex shear field 7 = 71 -I- 172 , 
according to. 


£ = eo+7- (1) 

Under the assumption that (eo) = 0, the observed elliptici¬ 
ties provide a simple unbiased point estimator for the shear. 
We would like to build estimated (noisy) pixelized maps of 
the shear field in a number of tomographic redshift bins 
from a catalogue of galaxy shapes, angular positions and 
redshifts; we can estimate the shear in a pixel p averaged 
over redshift bin a by averaging the galaxy ellipticities in 
that pixel, 

^ ^ ^ 9 ’ ( 2 ) 

' V g in pixel p, bin ce 

where is the number of sources in pixel p and redshift 
bin a. This provides an unbiased estimator for the tomo¬ 
graphic shear field, i.e., the 3D shear 7 ( 0 , (j), z) averaged over 
the pixel p and the galaxy redshift distribution for galaxies 
in redshift bin a, 

f / 'yC,<l>,z)p(a){z)dz, (3) 

where denotes solid angle, ADp is the solid angle of pixel 
p, 'y{0,(j>,z) is the full 3D shear field and p(a){z)dz is the 
redshift distribution for sources in redshift bin a (normalized 
to one over the bin). We hence assume a linear model for 
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■' (*^) 

7p 7 


ip ip 


+ €. 


(c) 



(4) 


where is the variance of intrinsic galaxy ellipticities (per 
component), A/'(p, a) denotes the Gaussian distribution with 
mean fi and variance and we are assuming that are 
uncorrelated between pixels (although this assumption can 
be straightforwardly lifted). Note that even if the ellipticity 
distribution is not Gaussian, provided many sources con¬ 
tribute to each pixel average the noise will become Gaussian 
according to the central limit theorem. 

The tomographic shear fields are complex with two real 
components, ; we take our data vector d 

to be composed of the two components of the complex shear 
for every pixel and redshift bin: 

d-fA(l) 7 ( 1 ) a( 2 ) .c( 2 ) 

^ t T2,p=n 7i.p=n 72 ,p=ii ■ ■ ■ 1 

7 ( 1 ) x(i) .^( 2 ) ~{ 2 ) . , 5 ^ 

/l,p=2> /2,p=2i a,p=2i l2,p=2i-'-h V'-’l 

Under the assumptions described above, the data vector is 
described by a linear model d = s -|- n where the field s is 
the collection of tomographic shear maps (whose statistics 


are described in ) 2.2.2 I and the noise n has covariance. 


(nn^) = N = diag 


^''p=i 






^''p=i 




N, 


( 1 ) 


p=2 


N, 


( 2 ) 


N, 


( 2 ) 


p=2 


For masked pixels there are no observed sources contributing 
to so the noise covariance for these pixels is taken to 
be infinite. 


considerations require ~ 0. However, systematic ef¬ 

fects could give rise to non-zero B-modes, so in a weak lens- 
ing analysis the estimation of the B-mode power is nonethe¬ 
less useful as it provides a test for systematic effects. 

In the Limber approximation (Limber 19541, the E- 


mode tomographic shear power spectra are given by (Kaiser 
T9^ [T9981 [HulpOT] [ 200 ^ [Takada fc Jain|2004] , 


^EE f 

— / 


dx 

Xm(x) 


W(c){x)'W(i3){x)A + zfPs 


Xm(x) 


;x 


( 8 ) 


where y is comoving distance, B(fc; x) is the 3D matter 
power spectrum and Xm(x) i® the transverse comoving dis¬ 
tance corresponding to comoving distance X- The lensing 
weight functions W(ct)ix) ^re given by 


qn H '2 rXH 

«’(.)(X) = ^^Xmix) J dx' 


/^Xm(x' - X) 


Xm(x') 


(9) 


where n(a){x)dx ~ P(a)iz)dz is the redshift distribution for 
galaxies in redshift bin a (normalized to one over the bin). 

The fields under consideration are the collection of to¬ 
mographic shear maps { 7 ’'“^( 6 ', (j))}. In the context of the hi¬ 
erarchical model in Fig.[^ the held s represented in harmonic 
space contains the set of harmonic coefhcients } 

arranged into a vector: 

s = (soo, Sl_l, Sio, Sll . . . Stm ■ ■ ■ ) , 


\ 



^E(nbins 


^T(2) 

B(ni-,ins)\ 

(6) 

S^m — [Xlrn ■ 

' 7.^m 5 ■ ' 

' • ’ llm 

5 7^771 ■ 

' Tim > • 

■ • 7£m ) ■ 

(10) 


The full covariance matrix C of the held s will then be block- 
diagonal, where each Km-mode contributes one block Cim, 

(ssf = C = diag (Coo, Ci_i, Cio, Cn ... Cim . •.), 

= diag(Co,Ci,C 2 ...C^...), (11) 


2.2.2 The signal: tomographic shear fields and their 
covariance 


The tomographic shear helds described in § are two- 
dimensional isotropic random helds with spin-weight -2 on 
the angular sky. Since the helds are isotropic, their (spin- 
2 ) spherical harmonic coefhcients are uncorrelated, making 
harmonic space a particularly convenient basis. The expan¬ 
sion coefhcients and two-point statistics of the complex shear 
(split into E- and B-mode components) are given by 

= 11 [ 7 '“n‘/>) 2 T;^(^)+ 7 *^“H‘A)- 2 T/^(^)] dn, 

7 fi“’ j - 7*^“^(‘/>)-2’F/m(«/>)] dXl, 

/ E(q:)* E(/3)\ /-yEE r r 

~ ■> 

/ E(q:)* B(/3)\ /->EB r r 

j B(q:)* B(^)\ /^BB r r /^\ 


where ±2 Dm are the spin-weight ±2 spherical harmonics, 
Cfjap, and Cf^fp are the B-mode, B-mode and cross 

EB angular power spectra between tomographic bins a and 
P and &nm is the Kronecker-delta. Typically, cosmological 
models predict negligible B-mode, so C^^p ~ 0, and parity 


where C/ = C^m < 8 ) T. 2 i+i is the block diagonal contribution 
for a given i mode, with 2 K -|- 1 diagonal sub-blocks for each 
m mode at the given K, 


/^EE 

^1,11 

^EE 

^1,12 

^EB 

^^,11 

/^EB 

'^£,12 

/^EE 

^1,21 

/~fEE 

^^,22 

^EB 

^£,21 

/^EB 

^^,22 

^BE 

^BE 

^£,12 

^EB 

^£,11 

y^BB 

^£,12 

/^BE 

^^,21 

riBE 

^£,22 

y^BB 

^£,21 

riBB 

^£,22 


V 


/ 


I„ is the n X n identity matrix, ® is the Kronecker product 
and again Ce,ai 3 are the tomographic angular power spectra 
between redshift bins a and fi. Note that, in principle, the 
shear covariance could contain contributions from both cos¬ 
mic shear (as described above) and also intrinsic alignments; 
see §3.4| for a discussion of including intrinsic alignments into 
this framework. 


2.2.3 Flat sky approximation 

In the limit where we have a small survey area, we can make 
the fiat sky-approximation and replace spherical harmonic 
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transforms with Fourier transforms. In this limit, the E- and 
B-mode shear coefficients and power spectra are given by: 


E(q!) 

7£' 


B(a) 

i 

~ ~2 ' 


/ E(a' 

( 7 / 

I*7E(/3)) 

^EE r 

( 7 / 

I*7B(/3)) 

^EB c 

( 7 / ' 

i*7B(«) 

^BB r 


(13) 


where £ = {£x,£y), the phase factor ipi = —{t\ — Py -\- 
2ilx£y)lP and the angular power spectra are as previously. 
The field vector for the flat-sky approximation shear coeffi¬ 
cients is then given by 


S — (S£j , 8^2,8^3,... . ...), 


S£ 




E(„bi„,) B{1) B(2) 

5 /£ 5 7 £ ? /£ ) 






(14) 


The full covariance matrix C of the field 8 will again be 
block-diagonal, with each £-mode contributing one block, 

(88^) = C = diag (C,,, C, 2 , C, 3 , C,, ... C,. ...), 

= diag(Co,Ci,C 2 ...C^...), (15) 


where here = Ce^iy (g) I„ is the block diagonal contribu¬ 
tion from each £ mode, and are the n sub-blocks for 

each of the I = {lx,£y) modes with \£\ = £, 


/ /-.EE 
/ ^^11 
/-.EjE 
^£,21 

/-.EE 

^£,12 

/-.EE 

^£,22 

/-.EB 
^£,11 
/-.EB 
0£ 21 

/-.EB 

^£,12 

/-.EB 

^£,22 

... \ 

/-.BE 

^£,11 

/-.BE 

^£,21 

/-.BE 

^£,12 

/-.BE 

^£,22 

/-.EB 

^£,11 

/-.BB 

0 ^ 21 

/-.BB 

^£,12 

/-.BB 

^£,22 



V 



3 HIERARCHICAL TOMOGRAPHIC SHEAR 
MAP-POWER SPECTRUM INFERENCE 

In the following section we develop the machinery for jointly 
inferring the tomographic shear fields s and covariance 


(power spectra) C as described in [ 2.2.2 from observed 


noisy tomographic shear maps d described in j ]2.2.1| How¬ 
ever, the formalism developed here applies to the more gen¬ 
eral problem of jointly inferring any Gaussian held and its 
covariance given a noisy estimate of the held and the data is 
described by a Gaussian linear model, i.e., d = s -|- n, where 
n is Gaussian noise of known covariance N. The methods 
described in this section are similar to approaches taken 


to power spectrum inference for GMB temperature ( 

Wan- 

delt, Larson & Lakshminarayanan|2004| |Eriksen et al. 

2004 

O’Dwyer et al. |20041 |Ghu et al.| 20051 Komatsu et al. 

2011 

Ade et al.|2014 

), GMB polarization 

Larson et al.|2007 

Erik- 

sen et al. 2007 

Komatsu et al. 2011 

Karakci et al.|2013 1 and 

large-scale structure (Jasche et al. 

2010 Jasche & Wandelt 


lensing. 
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3.1 Vanilla map-power spectrum inference and 
the Wiener filter 


The basic forward model for the tomographic shear maps 
is described in Fig. and is understood as follows: We be¬ 
gin by hxing a power spectrum (chosen from some prior). 
This power spectrum generates a shear held via the density 
P(s|C) (which is Gaussian under the current assumptions). 
We then specify a noise covariance matrix N and add noise 
to the realised shear map to give a realisation of the data, via 
the conditional density P(d|s, N). The graph in Fig.j^shows 
us explicitly the conditional structure of the full posterior 
P(C,s|d), since we can simply write the posterior as the 
product of the conditional densities (and priors) appearing 
on the edges of the graph: 


P(C,sid,N) 


P(dls,N)P(s|C)P(C) 

W) 


(17) 


where the conditional densities are given by 


P(d|s,N) 
P(s|C) = 


_ _ t _p-i(d-s)'^N ^(d-s) 

V( 2 ^)^|N| 

^- 4 stc~^s 

^/(2nW\C\ 


(18) 


and N = 2x ribins x Upix is the length of the vectors d and s 
for ribins tomographic shear maps each containing ripix pixels 
(and the factor of 2 is due to including both E- and P-mode 
degrees-of-freedom). For the prior on the covariance matrix 
we take a Jeffreys prior | Jeffreys|1961 1 on each sub-block of 
the covariance. 


P(C) (19) 

e 


where the sub-blocks Ce are pxp matrices with p = 2 xribins■ 
For a discussion of non-informative priors for covariance 
matrix inference, see Daniels & Kass (19991 and references 
therein. 

Hierarchical models lend themselves naturally to Gibbs 
or block-MCMC sampling, where at each step we draw a 
sample of each parameter in turn conditional on all others. 
For the current model, this means iteratively drawing sam¬ 
ples of the shear field and covariance matrix: 


C*+i ^ P(C|s*) 

8*+^ ^ P(s|C^d,N). (20) 


In order to build a Gibbs or block-MGMG sampler, then, 
we must know the conditional densities for each parameter 
given all others. The graphical model makes writing down 
these conditionals particularly straightforward; the condi¬ 
tional density for any parameter is given by the product 
of the conditional densities on all incoming and outgoing 
edges in the graph (upto a normalization constant). The 
conditional density of the shear map is hence: 


P(s|C,N,d) oc P(d|8,N)P(s|C) 


C (27r)-'''|CwF| 


„-5(s-dwF)^C3pp(s-dwF) 


( 21 ) 


where dwF = (C ^ -f N '^) ^N '^d is the Wiener filter 
of the data and the covariance Cwf — (C-i -f N-i)-^ 
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Similarly the conditional density of the signal covariance is 
given by: 




( 22 ) 


Since the covariance matrix is block diagonal, with each £m- 
mode contributing one block, and due to isotropy every m- 
mode for a given i has the same power, we can factorize the 
conditional on the covariance C into conditional distribu¬ 
tions on each ^-mode covariance C^, 


e 


P{Ce\s) 


|P^ I (2r-|-l)/2 II -(2r-|-2+p)/2 
2(2^+i)p/2rp(€-b I) 


-itr(C7ir,) 

(23) 


where Ti = and rp(-) is the multivariate gamma 

function. In the second line of Eq. (231 we recognise the con¬ 
ditional density of the Cr to be the inverse-Wishart distri¬ 
bution with support and i/t = 21 + 1 degrees-of-freedom, 
denoted by i/i)- 


Map sampling 


Sampling from the map conditional on the signal covariance 
involves generating a Gaussian random vector whose mean 
is the Wiener filter of the data dwF = (C“^ 
and with covariance Cwf = (C“^ + Whilst this is 

simple in principle, computing the Wiener filter involves in¬ 
verting the matrix (C“^-|-N~^) which has size N x N where 
N = 2 X ribins X n-pix for Ubins tomographic shear maps con¬ 
taining Upix pixels (and the factor of 2 is due to including 
both E- and B-mode degrees-of-freedom). Since in applica¬ 
tions the number of pixels is typically ripix ~ 10^ — 10® de¬ 
pending on the survey size and the angular scales of interest, 
this numerical matrix inversion is not computationally feasi¬ 
ble by brute force methods. Progress can be made if there ex¬ 
ists a basis where both C and N are sparse. For cosmic shear, 
the signal covariance is sparse in harmonic space whilst the 
noise covariance is sparse in pixel-space (for weakly or un¬ 
correlated pixel noise). In the idealised case where the pixel 
noise is both homogeneous and isotropic, N will be propor¬ 
tional to the identity matrix and is hence sparse (propor¬ 
tional to the identity) in any basis. The matrix inversion can 
then be performed in harmonic space where both the signal 
covariance and noise are now sparse. However, in practice 
the pixel noise will not be homogeneous and isotropic and 
it will not in general be possible to hnd a single basis where 
both C and N are sparse. For example, if the pixel noise 
is uncorrelated but the number of sources contributing to 
each pixel varies (as is inevitable in practice), N is no longer 
isotropic; it is diagonal in pixel space but will not be sparse 
in harmonic space. Pixel-pixel noise correlations would also 
result in N and C not being sparse in the same basis. In 
these cases, numerical implementations of the Wiener filter 
have traditionally relied on Krylov space methods, such as 
conjugate gradients, to solve the high-dimensional systems 
of equations (see e.g., Kitaura fc En61in|2008 and references 
therein). Recently a particularly elegant approach to solv¬ 
ing the Wiener filter equation was proposed, where an addi¬ 
tional messenger field is introduced to mediate between two 


different bases in which the signal and noise covariances are 
respectively sparse, bypassing the issue of directly invert¬ 
ing the high-dimensional matrices (Eisner fc Wandelt [2012 


2013 jjasche Lavaux 20151. We adopt this approach in 


§3.2|and apply it to simulated data in 


Power spectrum sampling 

In order to draw samples of the signal covariance for fixed 
shear field we simply need to generate inverse-Wishart dis¬ 
tributed random matrices with i/ = 2£+l degrees-of-freedom 
and support T(. Drawing samples from the inverse-Wishart 
distribution (T,v) can be straightforwardly performed 
as follows: 

(i) Generate v Gaussian random vectors ~ A/'(0, F"^). 

(ii) Gonstruct the sum of outer products of the vectors 
{xi}, i.e. X = 

(iii) Take the inverse of X, then X”'^ ~ as 

required. 


Survey Mask 

One of the major benehts of jointly inferring the shear map 
and power spectrum in a hierarchical setting is the ease with 
which masked regions can be accounted for. Masked pixels 
are treated as regions where the data provides no informa¬ 
tion, i.e., the noise covariance for masked pixels is taken to 
be infinite. However, this does not mean that we are to¬ 
tally ignorant about the field in those pixels. When gen¬ 
erating samples of the field for a hxed covariance, •(— 
P(s|C®,d,N) oc P(d|s,N)P(s|C®), there are two contribu¬ 
tions to the density P(s|C®,d,N); the data provides infor¬ 
mation about the field through the likelihood P(d|s, N) and 
the covariance provides additional information via P(s|C®). 
For pixels in which the noise covariance is infinite, all of the 
information comes from the prior on the held P(s|C®) at 
that sampling step, but nonetheless since the covariance is 
also being explored we are able to make inferences about 
the held inside the masked regions (albeit at lower signal- 
to-noise). 


3.2 Messenger field 

Sampling from the posterior distribution of the shear map 
and covariance described in §3.1| is conceptually simple and 
and can be reduced to drawing (multivariate) Gaussian and 
inverse-Wishart random variates. However, drawing samples 
of the map conditional on the signal covariance involves com¬ 
puting the Wiener hlter of the data and inverting the ma¬ 
trix (C~^ +N~^), which has size ~ ripix x ripix. If it is not 
possible to hnd a basis in which C and N are both sparse 
simultaneously (such as in the realistic case of anisotropic 
noise as discussed in |3.1[ ), this matrix inversion results in 
a formidable computational bottleneck. Fortunately, direct 
inversion of (C“^ -|- N“^) can be avoided by introducing an 
auxiliary Gaussian distributed messenger held t that medi¬ 
ates between the bases in which C and N are respectively 


sparse. This elegant idea was introduced by|Elsner & Wan- 

|delt|(|2012[|2013[) and further developed by 

Jasche & Lavaux 

120151; the approach taken here is close to 

Jasche & Lavaux 
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Figure 3. Hierarchical forward model for shear map-power spec¬ 
trum inference with the addition of a messenger field: the shear 
power spectrum is drawn from some prior distribution, a realiza¬ 
tion of the shear field is then generated given the power spectrum, 
isotropic noise with covariance T is added to give a realization of 
the messenger field t and finally an anisotropic noise component 
is added with covariance N to realize a noisy shear map d. 


(20151 but generalized to deal with non-diagonal signal co- 
variance matrices. 


The inclusion of the messenger field effectively allows us 
to split the noise N into two parts that are dealt with sep¬ 
arately: an isotropic part T = rl where r min [diag(N)], 
and the remaining anisotropic noise N = N — T. The gener¬ 
ative model for the data via the signal covariance C, shear 
field s and messenger field t is summarised in Fig. [^and is 
understood as follows: the signal covariance C generates a 
shear map, the shear map plus isotropic noise drawn from 
Af{0, T) generates a realisation of the messenger field t, and 
finally the messenger field plus an additional anisotropic 
noise component drawn from J\f{0, N) generates a realisa¬ 
tion of the data d. Importantly, note that the introduction 
of an additional level in the hierarchy separates the signal 
covariance C from the anisotropic noise covariance N, con¬ 
necting them only via T oc I which is diagonal in any basis; 
this is the essential function of the messenger field. 


The posterior and conditional distributions are: 


P(C,s,tjd,T,N) = 
P(d|t,N) = 


P(dit,N)P(t|s,T)P(s|C)P(C) 

P(d) 

_^-i{d-t)tN-l(d-t) 


P(S|C) = 


V(27r) 

1 


•|N| 


P(t|s,T) = 


yS2SW\C\ 
1 


y/(2r, 


-istc-l 


-i(t-s) + T''-{t-s) 


(24) 


Note that marginalising the posterior P(C,s,t|d) over the 
messenger field t recovers the joint posterior distribution for 
the shear field and signal covariance P(C,sjd) of Eq. (171, 
so sampling from the posterior P(C,s,tjd) and marginaliz¬ 
ing over s and t is completely equivalent to sampling from 
P(C,s|d) and marginalizing over s; both methods will gen¬ 
erate samples from the marginal posterior P(C|d) as de¬ 
sired. 

In order to block-MCMC or Gibbs sample from the pos¬ 
terior P(C, s, t|d) we need to iteratively draw samples from 
C, s and t conditional on all other parameters. 


C*+i ^ P(C|s'), 

^P(s|P,C\T), 

p+i ^P(t|s‘,d,N), (25) 

where the conditional densities (from the graph in Fig. 
are given by 


P(s|C,T,t) oc P(t|s,T)P(s|C) 

_ _ I _p-5(s-ria)tqa"^(s-pig) 

V(2^)^IQsl 

P(t|s, T, N, d) oc P(d|t, N)P(t!T, s) 

V(2^)^IQt| 

P{Ci\s)=W-\re,ut), (26) 


and the (conditional) shear field and messenger field means 
and covariances are given by, 

Ms = (C-^ + T-i)-iT-it, 

Qs = (C-^T-')-', 

Mt = (T~^ -b N"^)“^T~^s -b (T“^ + N“^)"^N“M, 

Qt = (T-i-bN"^)“\ (27) 


Crucially, note that the introduction of the messenger field 
has isolated the signal covariance C from the anisotropic 
noise N, where C and N now only appear in combination 
with the isotropic noise component T which is diagonal in 
any basis, since T = rl. Thus all of the necessary matrix 
inversions can now be performed in bases where the matrices 
are sparse, eliminating the need to solve the Wiener filter 
equation for high-dimensional dense matrices. 


Map sampling 

Sampling from the map conditional on the signal covariance 
and messenger field involves generating a Gaussian random 
vector with mean and covariance Ms ~ (C“^ -b 
and Qs = (C”'^ -b The noise component appearing 
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here is purely isotropic, T oc I, and hence T is diagonal 
in any orthogonal basis; the matrix Qs = (C“^ + 
is block-diagonal in harmonic space, with sub-blocks of size 
p = 2 X ribins, hence the largest matrix that has to be inverted 
during the map-sampling step is p x p. 


Messenger field sampling 

Sampling from the messenger held conditional on the 
shear held and the data involves generating a Gaus¬ 
sian random vector with mean and covariance = 
(T-i -I- N-i)-iT-is -f (T-i -b and Qt = 

If the pixel noise is weakly correlated, the 
anisotropic noise covariance N will be sparse in pixel-space, 
and in the limit where the noise can be assumed to be com¬ 
pletely uncorrelated N will be diagonal in the pixel-basis. 
Since T is diagonal in any basis by construction, inverting 
the matrix (T“^ + N~^) is now trivial in the pixel-basis. 
Miraculously, by splitting the noise into an isotropic and 
anisotropic component and introducing a messenger held to 
mediate between the data and the shear held, we have cre¬ 
ated an environment where C and N can be represented in 
bases in which they are respectively sparse simultaneously, 
eliminating the need to numerically invert ~ ripix x ripix ma¬ 
trices at each sampling step. 


Power spectrum sampling 

The conditional distribution for the signal covariance is 
unahected by the introduction of the messenger held into 
the hierarchy, so the power spectrum sampling step is 
again achieved by drawing inverse-Wishart random variates 


'li-l-l 


W as described in iS.l 


3.3 Cosmological parameter inference from the 
power spectrum and the Blackwell-Rao 
estimator 

The approach described in §3.2| allows us to jointly sample 
from the posterior P(C,s,tjd). Marginalizing over s and t 
we obtain samples from the marginal posterior distribution 
of the power spectrum (signal covariance) C given the data, 
i.e., 

P{C\d) = J P(C,s,t\d)dsdt. (28) 

Ultimately we are not interested in the power spectrum it¬ 
self, but rather what it can tell us about cosmological mod¬ 
els and parameters. Cosmological models provide a deter¬ 
ministic mapping between cosmological parameters and the 
shear power spectrum, i.e., for a given set of cosmological 
parameters G and model M we can compute C{G,M). This 
means that if we have access to the posterior P(C|d), we 
can straightforwardly sample the posterior distribution for 
0 in a given model M by drawing samples from 

9-.F(C((.,M)|d)p^Tg_, O) 

where the prior on the covariance C is effectively re¬ 
placed by a prior on 6 at this stage, through the ratio 
P{0)/P{C{G, M). In order to sample 0 we would like access 
to the full smooth posterior P(C|d), rather than just a set of 


Algorithm 1 Joint shear held-messenger field-power spec¬ 
trum Gibbs sampling. Here we make harmonic space and 
real space explicit, with harmonic space variables being de¬ 
noted with a tilde. F[-] represents the orthogonal transforma¬ 
tion from real space to harmonic E- and J?-mode coefficients, 
S) represents drawing from a multivariate Gaussian 
with mean fi and covariance S, and (r, i^) represents 
drawing from an Inverse-Wishart distribution with support 
r and v degrees-of-freedom as described in The organi¬ 
zation of the vectors and s are described by Eq. (101. 


Initialize parameter values: 

Draw C from some prior and initialize s; 

C ~ P(C) 

§-^■(0,0) 

s = [s] 

N Gibbs iterations: 
for i = 0 - 1 ^ N do 

Sample messenger field: 

Mean and Covariance: 

/Xt = (T-^ -f N-i)-^T-^s -b (T-i -b N-1)-1N-M 

Qt = (T-i -bN-^)-^ 

Draw Gaussian random variate: 
t ~ A/'(/Xt,Qt) 

Transform to harmonic space: 
t = F[t] 


Sample shear held: 
for all £, m modes do 

Mean and Covariance: 

Ps/m ~ 

Qsj = 

Draw Gaussian random variate: 

§rm ~ -^(Ps.rmi Qs.«) 

Transform to real space: 


Sample covariance matrix: 
for all I modes do 

Compute r^.- 
Fr = X]m,=-r 

Draw Inverse-Wishart random variate: 

~ w-i(r,,i/t) 


samples of the map and covariance {s*, C*}. The Blackwell- 
Rao estimator ( [Gelfand &: Smith|1990[ ) provides an efficient 
and very accurate way of estimating the smooth density 
P(C|d) from a set of samples of the held {s} = {s^,... ,s"}. 

The Blackwell-Rao estimator can be understood as fol¬ 
lows. When block-MCMG or Gibbs sampling the held and 
signal covariance, it is perfectly allowed to draw many sam¬ 
ples of C for each held sample. In the limit where we take 
a large number of covariance samples at every step, the co- 
variance histogram will become sufficiently smooth to allow 
us to estimate P(C|d) very well. The Blackwell-Rao estima¬ 
tor is the continuum limit of this idea; since we know the 
functional form of P(C|s) analytically, we can replace the 
covariance sampling step by the full analytical distribution. 
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The density P(C^|{si}) is then estimated by, 


P(C,|{s,})(x^^ 


|pi|(2<!+l)/2 

C,f(2<+2+p)72 


exp 


--tr(C7^r^) 


(30) 


This smooth density can then be used effectively for sam¬ 
pling e ~ P(C(0,M)|d)P(0)/P(C(6»,M)). In this work we 
draw samples from the posterior distribution of the tomo¬ 
graphic shear maps and power spectra as a proof-of-concept 
for hierarchical map-power spectrum inference for weak leas¬ 
ing, but note that application of the Blackwell-Rao esti¬ 
mator for extracting cosmological inferences (given sam¬ 
ples from the map-power spectrum posterior) is technically 
straightforward. For discussion of the Blackwell-Rao esti¬ 
mator in the context of CMB power spectrum inference see 
e.g., Wandelt, Larson & Lakshminarayanan (20041, and for 


application to CMB data see e.g., Chu et al. (20051. 


3.4 Intrinsic Alignments 

The intrinsic alignment (lA) of nearby galaxies, arising from 
their evolution in a shared tidal gravitational environment, 
is known to be an important systematic effect when extract¬ 
ing cosmological inferences from cosmic shear, i.e., from cor¬ 
relations of observed galaxy ellipticities across the sky (see 


e-g-, 

Joachimi et aL||20151 |Kirk et aL||20151 |Kiessling et al.| 

2015 

for detailed reviews). A careful analysis might include 


a parameterized model for intrinsic alignments, jointly in¬ 
ferring cosmological and intrinsic alignment model parame¬ 
ters simultaneously and marginalizing over the latter. It is 
straightforward to incorporate a model for intrinsic align¬ 
ments into the framework presented in this paper. In the 
presence of intrinsic alignments, the angular power spec¬ 
trum inferred from observed galaxy shapes will have contri¬ 
butions from both gravitational lensing and intrinsic align¬ 
ments, i.e., C = {0, M) -|- C^^(0, aiA, M), where we 

now distinguish between the cosmic shear power spectra 
M) (described in preceding sections) and the intrin¬ 
sic alignment power spectra C^^(0 , ccia, M) (containing all 
lA terms), which are functions of both the cosmology and an 
additional set of parameters ccia characterising the intrinsic 
alignment model. The inference process described in §3.1|3.2| 
achieves inference on the power spectrum C without refer¬ 
ence to any cosmological or intrinsic alignment model; the 
obtained posterior density P(C|d) describes the inference 
on the full angular power spectrum, containing (in princi¬ 
ple) both lensing and lA contributions. The joint posterior 
for the cosmological and intrinsic alignment model parame¬ 
ters can then be straightforwardly sampled, via P(C|d), by 
drawing samples from 

{0, aiA} ~ P(C(0,aiA, M)\d) (31) 

P(C(0, aiA, M)) 

analogously to Eq. ( |29[ ) and the description in ^3.3[ Having 
sampled the joint posterior P{d, Q:iA|d), the intrinsic align¬ 
ment nuisance parameters ckia can be easily marginalized 
over as desired. Performing power spectrum inference inde¬ 
pendent of any cosmological or intrinsic alignment model 
allows us to do parameter inference for different cosmolog¬ 
ical and lA models quickly and easily a posteriori, directly 
from the posterior distribution of C, without the need to 


re-analyse the full dataset. This is a major advantage for 
the method presented here. 


4 SIMULATIONS 


To demonstrate the joint map-power spectrum inference de¬ 
scribed in (including the messenger field sophistication), 
we apply the technique to simulated data. We generated a 
simulated shear catalogue using the SUNGLASS weak lensing 
simulation pipeline [Kiessling et al.||2011[ SUNGLASS gener¬ 
ates cosmological A-body simulations with 512® particles 
and a box of size length 512/i“®Mpc (with a mass resolu¬ 
tion of around 7.5 • lO^^M©) given a fixed ACDM cosmol¬ 
ogy using the gadget2 A-body code (Springel 20051. It 
then computes weak lensing effects along a lightcone by per¬ 
forming line-of-sight integrations in the Born approximation 
(with no radial binning). The computed weak lensing fields 
are then interpolated back onto the particles in the light- 


cone, generating a mock 3D shear catalogue (see Kiessling 


et al. 2011 for more details). We take a WMAP7 ACDM 
cosmology (Larson et al. 2011| Jarosik et al. 20111 with 


Qa = 0.73, ilm = 0.27, Qi = 0.045, Us = 0.96, as = 0.8 
and h = 0.71. The galaxy redshift distribution is given by 
p{z) oc , with zo = 0.7 corresponding to a me¬ 

dian redshift of around 1, the lightcone extends to a redshift 
of 2 and covers an area of lOdeg x lOdeg. Gaussian photo¬ 
metric redshift errors are added to each galaxy redshift, with 
zero mean and dispersion a^ = 0.05(1 -I- z). Each galaxy is 
also given an intrinsic random ellipticity, where the two ellip- 
ticity components are independently drawn from a Gaussian 
with zero mean and dispersion a^ = 0.27. Note that realis¬ 
tic ellipticity distributions are not expected to be Gaussian. 
However, since many galaxies are averaged together to give 
the estimated shear in each pixel, our results will be in¬ 
sensitive to the shape of the assumed intrinsic ellipticity 
distribution, depending only on the variance of the random 
ellipticity components due to the central limit theorem (in 
this study, ~ 330 galaxies contribute to the estimated shear 
in each pixel and tomographic bin). The result is a 3D cat¬ 
alogue of galaxies with angular positions, photometric red- 
shifts and observed ellipticities (given by the sum of the 
random intrinsic ellipticity and the shear for each source), 
covering lOdeg x lOdeg, a redshift range from z = 0 to z = 2, 
with an average of 30 galaxies per square arc minute on the 
sky. 


In order to perform a tomographic shear analysis, the 
catalogue is divided into two photometric redshift bins, with 
0 < Zph ^ 1 and 1 < Zph ^ 2 respectively. The sources in 
each redshift bin are then grouped into 128 x 128 angular pix¬ 
els and the estimated shear in each pixel is computed as the 
mean of the galaxy ellipticities in each of the pixels. A survey 
mask is added by masking out 5 circular patches positioned 
randomly over the survey area, each with radius ~ 0.8deg. 
The result is two 2D tomographic lOdeg x lOdeg noisy shear 
maps with a non-trivial survey mask, as described in j]2.2.1| 


5 RESULTS 

We applied the algorithm described in §3.2| and Algorithm 
to the simulated noisy tomographic shear maps gener¬ 
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Figure 4. Recovered F-mode tomographic shear power spectra: the inner orange bands indicate the 68% credible region, the outer grey 
bands indicate the 95% credible region, the black lines show the mean of the posterior distributions of the band powers, the red lines 
show the estimated band powers from the noiseless, mask-less simulated shear catalogue and the horizontal blue lines show indicate the 
mean ellipticity-noise level. 


ated using the sunglass simulation pipeline described in 
Since the simulated patch of sky is small we employ 
the fiat-sky approximations described in §2.2.3| It is highly 
desirable for the Gibbs sampler to bin several power spec- 


firstly, the number of model parameters is reduced bring¬ 
ing down the computational cost of each Gibbs sampling 
step. Secondly, in the implementation described in m the 
typical step size between two consecutive Gibbs samples is 
determined by the cosmic variance, whereas the full poste¬ 
rior has both cosmic variance and noise. As such, in the low 
signal-to-noise regime the sampler must make a large num¬ 
ber of steps to obtain two independent samples of the power 
spectrum coefficients. By binning £ modes together into a 
set of band-power coefficients, we are effectively increasing 
the signal-to-noise of the now reduced set of power spectrum 
coefficients, improving the sampling efficiency. To this end, i 
modes were binned together so that each bin contains ^ 80 
modes, giving a total of 194 band-power coefficients. Since 
there are two tomographic bins and two degrees-of-freedom 
for shear, this results in a total of 1940 model parameters 
for the power spectrum coefficients. The two tomographic 
shear maps contain 128 x 128 pixels, each with two degrees- 
of-freedom, so the pixels constitute a further 65,536 model 
parameters, bringing the total number of parameters in the 
inference task to 67,476. 

We ran three Gibbs chains with independent starting 


trum coefficients together (Larson et al.|2007 Eriksen et al. 


20061. This improves the sampling efficiency in two ways: 


points, each with 1.2 million steps. Convergence was deter¬ 


mined using the Gelman-Rubin statistic r ( Gelman fc Ru- 
bin|1992 I; the chains were deemed to have converged to the 


marginal distributions for all parameters, with r < 1.1 in all 
cases. Each sampling step required a clock-time of ~ 0.5 
seconds on a high-end 2015 desktop CPU. The obtained 
samples from the posterior distribution for the tomographic 
shear maps and power spectra are summarised in Fig. |4]|^ 
the inner orange bands show the 68% credible regions, the 
outer grey bands show the 95% credible regions, the black 
lines show the mean posterior band powers, the red lines 
show the estimated band powers from the noiseless (mask¬ 
less) simulated shear catalogue and the horizontal blue lines 
show the (average) pixel noise level due to random galaxy 
ellipticities. The mean and variance of the posterior distri¬ 
bution of the shear maps are shown alongside the simulated 
maps in Eig.j^ Obtained (smoothed) posterior distributions 
for the power spectrum coefficients for selected Gmodes are 
shown in Fig. 


Fig . I^shows the recovered E-mode tomographic (cross) 
power spectra for the two tomographic bins; the posterior 
distributions for the power spectra summarized by the or¬ 
ange (68%) and grey (95%) credible regions and black line 
(posterior mean) are clearly recovering the underlying power 
spectra in the simulation (red line). The posterior mean fol¬ 
lows the underlying simulated power (albeit with some scat¬ 
ter). Since the number of modes in each f-bin are chosen 
to be roughly the same, the cosmic variance is roughly the 
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Figure 5. Recovered S-mode tomographic shear power spectra: the inner orange bands indicate the 68% credible region, the outer 
grey bands indicate the 95% credible region, the red lines show the estimated band powers from the noiseless, mask-less simulated 
shear catalogue and the horizontal blue lines show indicate the average ellipticity-noise level. The recovered B mode power spectra are 
consistent with the small B-mode signal in the simulated data. For the small amplitude of S-modes in the simulation (1-2 orders of 
magnitude below the ellipticity noise level), the posteriors are also consistent with zero, effectively providing upper limits only. In order 
to make stronger statements about the S-mode power at this low signal-to-noise, one must either motivate a more informative prior or 
a more informative model for the S-modes. 


same for all of the band power coefficients. The increase in 
the posterior width with increasing i is hence due entirely 
to the decreasing signal-to-noise, since the amplitude of the 
F^-mode power spectra decreases as a function of i whilst 
the noise remains constant (i.e., independent of scale). 

Fig. [5] shows the recovered B-mode tomographic power 
spectra. The posterior distributions for the B-mode power 
(for the auto-bins) are consistent with the small signal in the 
simulated data. Due to the small amplitude of the B-mode 
power (a factor of 10-100 below the ellipticity noise level) 
the B-mode posteriors are also consistent with zero, effec¬ 
tively providing upper limits only. In order to make stronger 
statements about the B-modes (at this low signal-to-noise), 
one must either motivate a more informative prior or choose 
a more informative model for the B-mode power. In the ab¬ 
sence of prior information or understanding of the B-mode 
signal, Inference with an (uninformative) Jeffrey’s prior, as 
chosen here, is appropriate. When zero B-modes are ex¬ 
pected from cosmology and the B-mode power is used purely 
as a test of systematlcs, one could perform Bayesian model 
selection on a model with both E- and B-modes versus a 
model with B-modes only, in order to make stronger state¬ 
ments about the presence of residual B-modes in the data. 
The Bayesian hierarchical modelling approach provides a 
natural framework for such model selection. The recovered 
posterior inference on the B-mode cross power between the 


two tomographic bins is also consistent with the data (and 
consistent with zero). 

Fig.j^shows the recovered EB cross tomographic power 
spectra for the two tomographic bins. We expect zero cor¬ 
relation between E- and B-modes due to parity, and indeed 
all of the recovered EB power spectra are consistent with 
zero as expected. 

Fig .[^ shows the mean and variance of the posterior dis¬ 
tributions for the maps for the 71 component of the complex 
shear, alongside the noiseless (mask-less) simulated shear 
maps and the simulated maps with noise and mask added. 
The mean posterior maps are clearly recovering the struc¬ 
ture in the underlying simulated shear maps. Importantly, 
note that Inference of the shear field in the masked regions 
is obtained as described in © but the posterior variance in 
these regions is significantly higher than in the unmasked 
regions as one would expect, since the data provides no di¬ 
rect information about the field in those pixels; informa¬ 
tion is only obtained through the power spectrum inference 
(which provides prior information on the field at each sam¬ 
pling step) combined with the inferred field in pixels sur¬ 
rounding the masked regions. 

Fig .|^ shows the obtained posterior distributions for the 
Ce coefficients for selected Bmodes for B-mode, B-mode and 
BB-cross power (for the first tomographic bin only). For 
the B-modes, the posterior distributions are clearly non- 
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Figure 6. Recovered EB-cross tomographic shear power spectra: the inner orange bands indicate the 68% credible region, the outer 
grey bands indicate the 95% credible region, the black lines show the mean of the posterior distribution of the band powers and the 
red lines show the estimated band powers from the noiseless, mask-less simulated shear catalogue. All recovered EB cross powers are 
consistent with zero, as expected. 


Gaussian showing significant negative skewness; the method 
has captured the full non-Gaussian shape of the posterior 
distributions of the power spectrum coefficients. The asym¬ 
metry increases with increasing £ as the signal-to-noise (per 
mode) decreases. For the B-modes, where the signal-to-noise 
is very low across the full range of i, the posteriors are 
peaked at (or close to) zero with positive tails; these will 
be somewhat prior dependent, although we note that at 
very low signal-to-noise this behaviour may be expected for 
a range of (weakly informative) priors. We reiterate that 
in order to make stronger statements about the B-modes 
at this low signal-to-noise, one must provide a (motivated) 
more informative prior or more informative model. 


6 CONCLUSIONS 

We have developed a Bayesian hierarchical approach to cos¬ 
mic shear power spectrum inference, whereby we sample 
from the joint posterior distribution of the shear map and 
power spectrum in a Gibbs sampling framework. Similar 
methods have been developed and applied for power spec¬ 
trum inference in the context of CMB temperature, polar¬ 
ization and large-scale structure problems. The joint map- 
power spectrum inference approach has a number of highly 
desirable features. Complicated survey geometry and masks 
are trivially accounted for in contrast to approximate power 
spectrum estimation methods such as the pseudo-C^ ap¬ 
proach, since the inference of the power spectrum and un¬ 


masked pixels provides prior constraints on the field in the 
masked regions where the data provides no information di¬ 
rectly. The posterior distribution on the shear power spec¬ 
trum is a powerful intermediate product for a cosmic shear 
analysis, since it allows cosmological parameter inference 
and model selection to be performed without loss of infor¬ 
mation for a wide range of models a posteriori-, this is clearly 
preferable to analysing the full data set for each model in¬ 
dependently. The method is exact and optimal under the 
assumption of Gaussian fields, in the sense that no infor¬ 
mation is lost from the data once it has been pre-processed 
into estimated noisy shear maps. Loss of information and 
biases in this pre-processing step (from raw pixel data to es¬ 
timated galaxy shapes and redshifts) can be avoided by em¬ 
bedding the hierarchical map-power spectrum model into a 
larger global hierarchical model for cosmic shear accounting 
for instrumental effects, modelling distributions of intrinsic 
galaxy properties and cosmology together in a single global 
analysis. 

We demonstrate the method by performing tomo¬ 
graphic map-power spectrum inference on a simulated shear 
catalogue of galaxies with random intrinsic ellipticities, dis¬ 
tributed across the sky and in redshift (with photo-a er¬ 
rors), processed into two tomographic, pixelized, noisy shear 
maps, with a complicated survey mask. The simulation cov¬ 
ers 10 deg X 10 deg, extends to redshift z = 2 and contains 
on average 30 galaxies per square arc minute. The map- 
power spectrum inference approach produces posterior dis- 
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Figure 7. Tomographic maps of the 71 component of the shear for the noiseless simulated shear maps (far left), noisy masked simulated 
maps (second from left), mean posterior maps (third from left) and the posterior variance (far right). The mean posterior maps recover 
most of the structure from the simulated shear maps. Note that inference is made about the field in masked regions, but the posterior 
variance in those regions is significantly higher than in unmasked regions as expected. 


tributions for the shear map and tomographic power spectra, 
successfully recovering the simulated i?-mode, B-mode and 
EB-ctoss tomographic power spectra from the simulation. 


The Bayesian map-power spectrum inference approach 
can be straightforwardly extended to 3D shear power spec¬ 
trum inference, joint shear-magnification analysis ([Heaven^ 


Alsing & Jaffe 2013 Alsing et al. 20151 or joint analysis 


of weak lensing with other probes (galaxy clustering, CMB 
etc). Non-Gaussianity of the shear field on small angular 
scales means the Gaussian distribution for the shear field 
taken in this work will be sub-optimal when extracting infer¬ 
ences on these scales. The framework developed here can in 
principle be extended to jointly estimate the power spectrum 
and higher-order statistics for the shear field, capturing more 
of the available information on non-linear scales. This would 
require specifying a model for the non-Gaussian shear prob¬ 
ability density; the appearance of this non-Gaussian proba¬ 
bility density in the hierarchical model will likely make the 
conditional distributions too complicated to permit Gibbs 
sampling. In this case an alternative sampling scheme, such 
as Hamiltonian Monte Carlo (HMC), should be developed 
(see e.g., Jasche fc Kitaura|2010 for a successful application 
of non-Gaussian field inference using HMC in the context 
of large-scale structure). We will investigate these issues in 
future work. 
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